
Scale-free memory model for multiagent reinforcement learning. Mean field approximation and rock-paper-scissors dynamics


Abstract

A continuous-time model is developed for multiagent systems governed by reinforcement learning with scale-free memory. The agents are assumed to act independently of one another in optimizing their choice of possible actions via trial-and-error search. To gain awareness of the action value, the agents accumulate in their memory the rewards obtained from taking a specific action at each moment of time. The contribution of past rewards to the agent's current perception of action value is described by an integral operator with a power-law kernel. Finally, a fractional differential equation governing the system dynamics is obtained. The agents are considered to interact with one another implicitly, via the reward of one agent depending on the choice of the other agents. The pairwise interaction model is adopted to describe this effect. As a specific example of systems with non-transitive interactions, two-agent and three-agent systems of the rock-paper-scissors type are analyzed in detail, including stability analysis and numerical simulation. Scale-free memory is demonstrated to cause complex dynamics of the systems at hand. In particular, it is shown that there can be simultaneously two modes of the system instability undergoing subcritical and supercritical bifurcation, with the latter exhibiting anomalous oscillations with the amplitude and period growing with time. Besides, the instability onset via this supercritical mode may be regarded as "altruism self-organization". For the three-agent system, the instability dynamics is found to be rather irregular and can be composed of alternating fragments of oscillations with different properties.
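
The idea of weighting past rewards with a power-law kernel can be illustrated with a rough discrete-time sketch in Python. This is not the model from the paper: the abstract gives no equations, so the kernel form `(k + 1) ** (-alpha)`, the softmax action choice with parameter `beta`, the mean-field (expected) rewards, and all function names and default values below are assumptions made purely for illustration. Only the standard rock-paper-scissors payoff and the notion of power-law memory come from the text.

```python
import math

# Standard rock-paper-scissors payoff for the row player: +1 win, -1 loss, 0 tie.
# Action order: 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def power_law_weights(n, alpha):
    """Normalized weights for rewards 0..n-1 steps in the past.

    Assumed kernel: weight of a reward k steps back is (k + 1) ** (-alpha).
    """
    raw = [(k + 1) ** (-alpha) for k in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def perceived_values(reward_history, alpha):
    """Power-law weighted average of past reward vectors (oldest first)."""
    n = len(reward_history)
    weights = power_law_weights(n, alpha)
    values = [0.0, 0.0, 0.0]
    for k, w in enumerate(weights):
        rewards = reward_history[n - 1 - k]  # reward vector k steps back
        for a in range(3):
            values[a] += w * rewards[a]
    return values


def softmax(values, beta):
    """Assumed action-choice rule: Boltzmann probabilities with sharpness beta."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    s = sum(exps)
    return [e / s for e in exps]


def expected_rewards(opponent_probs):
    """Mean-field reward of each action against the opponent's mixed strategy."""
    return [
        sum(PAYOFF[a][b] * opponent_probs[b] for b in range(3))
        for a in range(3)
    ]


def simulate(steps, alpha=0.5, beta=2.0,
             p1=(0.5, 0.3, 0.2), p2=(0.2, 0.5, 0.3)):
    """Iterate two agents that each respond to power-law-weighted past rewards."""
    hist1, hist2 = [], []
    probs1, probs2 = list(p1), list(p2)
    trajectory = []
    for _ in range(steps):
        # Each agent's reward depends only on the other agent's current choice.
        hist1.append(expected_rewards(probs2))
        hist2.append(expected_rewards(probs1))
        probs1 = softmax(perceived_values(hist1, alpha), beta)
        probs2 = softmax(perceived_values(hist2, alpha), beta)
        trajectory.append((probs1, probs2))
    return trajectory
```

Calling `simulate(200)` returns a list of `(probs1, probs2)` pairs that can be plotted to see how the choice probabilities evolve. A smaller `alpha` gives older rewards more weight, which is the sense in which the memory has no characteristic time scale; whether this toy iteration reproduces any of the instabilities reported in the abstract is not claimed here.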

Bibliographic details

  • Year: 2010
  • Original format: PDF
  • Language: English
